Search results for "Distributed database"
Showing 10 of 23 documents
Rings for Privacy: an Architecture for Large Scale Privacy-Preserving Data Mining
2021
This article proposes a new architecture for privacy-preserving data mining based on Multi-Party Computation (MPC) and secure sums. While traditional MPC approaches rely on a small number of aggregation peers replacing a centralized trusted entity, the current study puts forth a distributed solution that involves all data sources in the aggregation process, with the help of a single server for storing intermediate results. A large-scale scenario is examined, together with the possibility that data become inaccessible during the aggregation process. Traditional schemes often neglect this possibility; here it is explicitly examined, as it might be provoked by intermittent network connec…
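A minimal sketch of the classic secure-sum primitive that such ring architectures build on (illustrative only, not the paper's protocol): the initiating node masks its value with a random offset, each peer adds its own value to the running total, and the initiator strips the mask at the end, so no peer ever sees another's raw value.

```python
import random

MODULUS = 2**32  # arithmetic modulo a large constant so masked partial sums leak nothing

def ring_secure_sum(private_values):
    """Illustrative ring-based secure sum. The first party adds a random mask
    to its value before sending; every other party adds its own value; the
    first party removes the mask from the final total."""
    mask = random.randrange(MODULUS)
    running = (private_values[0] + mask) % MODULUS  # initiator sends masked value
    for v in private_values[1:]:
        running = (running + v) % MODULUS           # each peer adds its contribution
    return (running - mask) % MODULUS               # initiator strips the mask
```

Fault tolerance for sources that drop out mid-round, the focus of the abstract above, requires additional machinery beyond this basic primitive.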
ImageRover: A Content-Based Image Browser for the World Wide Web
1997
ImageRover is a search-by-image-content navigation tool for the World Wide Web (WWW). To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that se…
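One way to picture the approximate k-d tree search mentioned above (a sketch, not the paper's optimized algorithm): build a standard k-d tree, then cap the nearest-neighbor descent with a node-visit budget, trading exactness for speed. All names here are illustrative.

```python
def build_kdtree(points, depth=0):
    # Recursively split points on alternating axes.
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, budget=50):
    """Depth-first nearest-neighbor search that stops exploring once the
    node-visit budget is exhausted, returning the best candidate found so far
    (the 'approximate' part of the search)."""
    best = [None, float("inf")]
    visits = [0]

    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, query))

    def search(n):
        if n is None or visits[0] >= budget:
            return
        visits[0] += 1
        d = dist2(n["point"])
        if d < best[1]:
            best[0], best[1] = n["point"], d
        diff = query[n["axis"]] - n["point"][n["axis"]]
        near, far = (n["left"], n["right"]) if diff < 0 else (n["right"], n["left"])
        search(near)
        if diff ** 2 < best[1]:  # only cross the splitting plane if it can help
            search(far)

    search(node)
    return best[0]
```

With a generous budget the search is exact; shrinking the budget bounds query time at the cost of occasionally returning a near-optimal neighbor, which is usually acceptable for image retrieval.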
Streamlining distributed Deep Learning I/O with ad hoc file systems
2021
With evolving techniques to parallelize Deep Learning (DL) and the growing amount of training data and model complexity, High-Performance Computing (HPC) has become increasingly important for machine learning engineers. Although many compute clusters already use learning accelerators or GPUs, HPC storage systems are not suitable for the I/O requirements of DL workflows. Therefore, users typically copy the whole training data to the worker nodes or distribute partitions. Because DL depends on randomized input data, prior work has shown that partitioning impacts DL accuracy. Those solutions focused mainly on training I/O performance over a high-speed network but did not cover the data stage-in pro…
Collaborative Assessment of Information Provider's Reliability and Expertise Using Subjective Logic
2011
In a Q&A setting, each user can individually estimate the expertise and the reliability of her peers using her direct interactions with them and our framework. The online SN (OSN), which can be considered as a distributed database, performs continuous data aggregation for assessing users' expertise and reliability in order to reach a consensus. We emulate a Q&A SN to examine various performance aspects of our algorithm (e.g., convergence time, responsiveness, etc.). Our evaluations indicate that it can accurately assess the reliability and the expertise of a user with a small number of samples and can successfully react to the latter's behavior change, provided that the cognitive traits hold in practice.
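To make the consensus idea concrete, here is a sketch of the standard cumulative fusion operator from subjective logic, which combines two independent binomial opinions (belief, disbelief, uncertainty) about the same peer; whether the paper uses exactly this operator is an assumption.

```python
def fuse(op_a, op_b):
    """Cumulative fusion of two binomial subjective-logic opinions (b, d, u),
    each summing to 1. Combining independent observations shrinks the
    uncertainty component, e.g. when two users pool their assessments of an
    information provider's reliability."""
    b_a, d_a, u_a = op_a
    b_b, d_b, u_b = op_b
    kappa = u_a + u_b - u_a * u_b  # assumes at least one opinion is uncertain (kappa != 0)
    b = (b_a * u_b + b_b * u_a) / kappa
    d = (d_a * u_b + d_b * u_a) / kappa
    u = (u_a * u_b) / kappa
    return (b, d, u)
```

The fused opinion remains a valid opinion (components still sum to 1) and its uncertainty is lower than either input's, which is why repeated aggregation drives the network toward consensus.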
Object Clustering Methods and a Query Decomposition Strategy for Distributed Object-Based Information Systems
1999
Emerging developments and advances in distributed processing have created a need for tools and methods to partition and distribute information systems across interconnected processors. In particular, distribution approaches which take into account the key characteristics of OO concepts are required to extend traditional fragmentation results to object-oriented database systems. To fulfill the above requirements, we propose a methodology for the distribution design of object-based information systems. The underlying approach consists of techniques and heuristics that can be used to create clusters of interrelated object classes that can be fragmented interdependently, producing distribution…
Accelerating data queries on Hadoop framework by using compact data formats
2016
There are massive amounts of data generated from IoT, online transactions, click streams, emails, logs, posts, social networking interactions, sensors, mobile phones and their applications, etc. The question is where and how to store these data in order to provide faster data access. Understanding and handling Big Data is a big challenge. Research in Big Data projects using Hadoop technology, MapReduce-style frameworks, and compact data formats such as RCFile, SequenceFile, ORC, Avro, and Parquet shows that only two of these formats (Avro and Parquet) support schema evolution and compression in order to use less storage space. In this paper, file formats like Avro and Parquet are c…
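The schema-evolution property mentioned above can be sketched as follows (record and field names are hypothetical): in Avro, adding a field with a `default` lets readers using the new schema still decode records written under the old one, which is what makes the format safe for long-lived stored data.

```python
# Avro schemas are commonly expressed as JSON; Python dicts mirror that shape.
OLD_SCHEMA = {
    "type": "record", "name": "ClickEvent",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
}

# Evolved schema: the added optional field carries a default, so records
# written with OLD_SCHEMA remain readable under NEW_SCHEMA (the core of
# Avro's schema-resolution rules).
NEW_SCHEMA = {
    "type": "record", "name": "ClickEvent",
    "fields": OLD_SCHEMA["fields"] + [
        {"name": "referrer", "type": ["null", "string"], "default": None},
    ],
}
```

Removing a field is symmetric: it is safe as long as old readers had a default for it, which is why evolution rules constrain both writers and readers.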
Infiniviz: Taking Quake 3 Arena on a Large-Scale Display System to the Next Level
2018
The authors of this paper have previously presented a large-scale display system called Infiniviz in other publications. Infiniviz attempts to improve network bandwidth consumption and computational performance compared to other existing large-scale display systems. Since the previous publications were made in the early development stages of Infiniviz, only an overview of the software architecture and details of the hardware implementation have been presented so far. This paper contains a real-life test of Infiniviz running Quake 3 Arena at a resolution of 9600 x 5400 at 24 fps. Also, in this paper, the authors have tried to match their results to what has been published by other researchers …
HybridS: A Scheme for Secure Distributed Data Storage in WSNs
2008
In unattended wireless sensor networks (WSNs), data is stored locally or at designated nodes upon sensing, and users can access it on demand. This paradigm can improve energy efficiency and system robustness by making use of upcoming cheap, large flash memories. Nevertheless, the security and dependability of distributed storage are critical for the applicability of such WSNs. In this paper, we propose a secure and dependable data storage scheme by taking advantage of secret sharing and Reed-Solomon codes, which has computational security and yet maintains optimal data size. Extensive analysis verifies that our scheme can provide secure and dependable data storage in WSNs in the…
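For a sense of the secret-sharing half of such a scheme, here is a minimal (k, n) Shamir sketch over a prime field; it is illustrative of the primitive, not the paper's construction, and omits the Reed-Solomon coding entirely.

```python
import random

PRIME = 2_147_483_647  # a Mersenne prime, large enough for small demo secrets

def make_shares(secret, n, k):
    """Split `secret` into n shares such that any k reconstruct it and
    fewer than k reveal nothing: evaluate a random degree-(k-1) polynomial
    with constant term `secret` at n distinct nonzero points."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term (the secret).
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse (Fermat's little theorem)
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

Storing shares at k-of-n sensor nodes tolerates node loss (dependability) while any coalition smaller than k learns nothing (security); erasure coding such as Reed-Solomon then reduces the storage overhead.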
GPCALMA, a mammographic CAD in a GRID connection
2003
The purpose of this work is the development of an automatic system which could be useful for radiologists in the investigation of breast cancer. A breast neoplasia is often marked by the presence of microcalcifications and massive lesions in the mammogram: hence the need for tools able to recognize such lesions at an early stage. GPCALMA (Grid Platform Computer Assisted Library for MAmmography), a collaboration among Italian physicists and radiologists, has built a large distributed database of digitized mammographic images (at this moment about 5500 images corresponding to 1650 patients). This collaboration has developed a CAD (Computer Aided Detection) system which, installed in an integrated…
Distributed medical images analysis on a Grid infrastructure
2007
In this paper, medical applications on a Grid infrastructure, the MAGIC-5 Project, are presented and discussed. MAGIC-5 aims at developing Computer Aided Detection (CADe) software for the analysis of medical images on distributed databases by means of GRID Services. The use of automated systems for analyzing medical images improves radiologists’ performance; in addition, it could be of paramount importance in screening programs, due to the huge amount of data to check and the cost of related manpower. The need for acquiring and analyzing data stored in different locations requires the use of Grid Services for the management of distributed computing resources and data. Grid technologies allow…